domain name
Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names
Wang, Hao, Wang, Yingshuo, Gan, Junang, Cheng, Yanan, Zhang, Jinshuai
Online pornography and gambling have consistently posed regulatory challenges for governments, threatening both personal assets and privacy. It is therefore imperative to research the classification of newly registered Pornographic and Gambling Domain Names (PGDN). However, scholarly investigation into this topic is limited: some previous efforts in PGDN classification pursue high accuracy using idealized sample data, while others employ up-to-date data from real-world scenarios but achieve lower classification accuracy. This paper introduces the Real-PGDN method, which covers the complete process of timely and comprehensive real-data crawling, feature extraction tolerant of missing features, precise PGDN classification, and assessment of application effects in actual scenarios. Our two-level classifier, which integrates CoSENT (BERT-based), a Multilayer Perceptron (MLP), and traditional classification algorithms, achieves 97.88% precision. The research process produced the NRD2024 dataset, which contains 20 days of continuous detection information across 6 directions for 1,500,000 newly registered domain names. Results from our case study demonstrate that the method also maintains a forecast precision of over 70% for PGDN whose usage is delayed after registration.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
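The two-level design described above can be sketched in miniature: a cheap first-level check decides the obvious cases, and only the remaining domains are escalated to a heavier second-level model. The suspect-token list, scores, and thresholds below are hypothetical stand-ins, not the paper's CoSENT/MLP pipeline.

```python
# Hypothetical two-level classifier sketch; tokens and thresholds are
# illustrative, not the features used by Real-PGDN.
SUSPECT_TOKENS = {"casino", "bet", "slot", "porn", "sex"}

def level1_score(domain: str) -> float:
    """Cheap lexical score in [0, 1] based on suspect-token hits."""
    name = domain.lower()
    hits = sum(tok in name for tok in SUSPECT_TOKENS)
    return min(1.0, hits / 2)

def level2_score(domain: str) -> float:
    """Stand-in for a heavier semantic model (CoSENT + MLP in the paper);
    here, a dummy heuristic on the registered-label length."""
    return 0.9 if len(domain.split(".")[0]) > 15 else 0.1

def is_pgdn(domain: str, threshold: float = 0.8) -> bool:
    if level1_score(domain) >= threshold:
        return True                      # level 1 is confident
    return level2_score(domain) >= 0.5   # escalate uncertain cases
```

The point of the split is cost: most newly registered domains are settled by the cheap first level, so the expensive model only sees the ambiguous tail.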
SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs
Guo, Yu, Jin, Dong, Ye, Shenghao, Chen, Shuangwu, Yang, Jian, Tan, Xiaobin
Large Language Models (LLMs) have demonstrated significant potential in text-to-SQL reasoning tasks, yet a substantial performance gap persists between existing open-source models and their closed-source counterparts. In this paper, we introduce SQLForge, a novel approach for synthesizing reliable and diverse data to enhance text-to-SQL reasoning in LLMs. We improve data reliability through SQL syntax constraints and SQL-to-question reverse translation, ensuring data logic at both structural and semantic levels. We also propose an SQL template enrichment and iterative data domain exploration mechanism to boost data diversity. Building on the augmented data, we fine-tune a variety of open-source models with different architectures and parameter sizes, resulting in a family of models termed SQLForge-LM. SQLForge-LM achieves state-of-the-art performance on the widely recognized Spider and BIRD benchmarks among the open-source models. Specifically, SQLForge-LM achieves EX accuracy of 85.7% on Spider Dev and 59.8% on BIRD Dev, significantly narrowing the performance gap with closed-source methods.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
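The "SQL syntax constraints" idea — keeping only synthesized SQL that is structurally valid against the target schema — can be approximated by asking SQLite to plan each candidate query in an in-memory database. The schema and queries below are hypothetical examples; the paper's actual constraint mechanism may differ.

```python
import sqlite3

# Hypothetical target schema for illustration.
SCHEMA = "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"

def compiles_against_schema(sql: str, schema: str = SCHEMA) -> bool:
    """Return True if SQLite can plan the query against the schema.
    This catches both syntax errors and references to unknown
    tables/columns, without executing the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

A filter like this is cheap enough to run over every synthesized sample before the more expensive SQL-to-question reverse translation.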
DNS Tunneling: Threat Landscape and Improved Detection Solutions
Amirov, Novruz, Isik, Baran, Tuncer, Bilal Ihsan, Bahtiyar, Serif
Detecting DNS tunneling is a significant challenge in cybersecurity due to its capacity to hide harmful actions within DNS traffic that appears normal and legitimate. Traditional detection methods based on rule-based approaches or signature matching are often insufficient to accurately identify such covert communication channels. This paper addresses the necessity of machine learning methods for effective DNS tunneling detection. We propose a novel approach to detect DNS tunneling: by combining advanced machine learning algorithms with the analysis of various features extracted from DNS traffic, we aim to provide an accurate DNS tunneling detection model. A. About the Subject: The Domain Name System (DNS) is a hierarchical and decentralized naming system crucial for internet functionality [1]. As a core component of internet infrastructure, DNS is used in nearly every online transaction, making it a prime target for a variety of cyber threats. Due to its foundational role and widespread trust, DNS is vulnerable to several types of attacks (the threat landscape is surveyed in [2]), such as cache poisoning, amplification and DoS attacks, and phishing. These vulnerabilities offer attackers multiple avenues to disrupt or manipulate internet traffic.
- Oceania > Palau (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.68)
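The kind of per-query features such a detector consumes can be illustrated with a short sketch: tunneled payloads typically surface as long, high-entropy subdomain labels. The feature set below is a common illustrative choice, not the paper's exact feature list, and real detectors add timing and volume features on top.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def dns_features(qname: str) -> dict:
    """Illustrative lexical features for one DNS query name."""
    labels = qname.rstrip(".").split(".")
    sub = ".".join(labels[:-2])  # everything left of the registered domain
    return {
        "qname_len": len(qname),
        "label_count": len(labels),
        "max_label_len": max(len(l) for l in labels),
        "subdomain_entropy": shannon_entropy(sub) if sub else 0.0,
        "digit_ratio": sum(ch.isdigit() for ch in qname) / len(qname),
    }
```

On a name like `aGVsbG8gd29ybGQx.t.example.com` (base64-style exfiltration label, hypothetical) the subdomain entropy is far higher than on `www.example.com`, which is exactly the separation a downstream classifier exploits.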
Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding
Yoda, Kazuki, Kawamoto, Kazuhiko, Kera, Hiroshi
The hardness of learning a function that attains a target task relates to its input-sensitivity. For example, image classification tasks are input-insensitive as minor corruptions should not affect the classification results, whereas arithmetic and symbolic computation, which have been recently attracting interest, are highly input-sensitive as each input variable connects to the computation results. This study presents the first learning-based Quick Response (QR) code decoding and investigates learning functions of medium sensitivity. Our experiments reveal that Transformers can successfully decode QR codes, even beyond the theoretical error-correction limit, by learning the structure of embedded texts. They generalize from English-rich training data to other languages and even random strings. Moreover, we observe that the Transformer-based QR decoder focuses on data bits while ignoring error-correction bits, suggesting a decoding mechanism distinct from standard QR code readers.
- Information Technology (0.46)
- Automobiles & Trucks (0.46)
Intelligent Detection of Non-Essential IoT Traffic on the Home Gateway
Palmese, Fabio, Mandalari, Anna Maria, Haddadi, Hamed, Redondi, Alessandro Enrico Cesare
The rapid expansion of Internet of Things (IoT) devices, particularly in smart home environments, has introduced considerable security and privacy concerns due to their persistent connectivity and interaction with cloud services. Despite advancements in IoT security, effective privacy measures remain lacking, with existing solutions often relying on cloud-based threat detection that exposes sensitive data or outdated allow-lists that inadequately restrict non-essential network traffic. This work presents ML-IoTrim, a system for detecting and mitigating non-essential IoT traffic (i.e., not influencing the device operations) by analyzing network behavior at the edge, leveraging Machine Learning to classify network destinations. Our approach includes building a labeled dataset based on IoT device behavior and employing a feature-extraction pipeline to enable a binary classification of essential vs. non-essential network destinations. We test our framework in a consumer smart home setup with IoT devices from five categories, demonstrating that the model can accurately identify and block non-essential traffic, including previously unseen destinations, without relying on traditional allow-lists. We implement our solution on a home access point, showing the framework has strong potential for scalable deployment, supporting near-real-time traffic classification in large-scale IoT environments with hundreds of devices. This research advances privacy-aware traffic control in smart homes, paving the way for future developments in IoT device privacy.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
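The essential-vs-non-essential decision can be sketched with a toy per-destination feature extractor and decision rule. The feature names and thresholds are illustrative assumptions, standing in for the learned classifier ML-IoTrim trains at the edge.

```python
# Hypothetical sketch: flag a cloud destination as essential or not from
# simple behavioral features. Thresholds are illustrative, not learned.
def destination_features(flows):
    """flows: list of (bytes_sent, bytes_received) pairs observed for
    one device/destination pair over an observation window."""
    sent = sum(s for s, _ in flows)
    recv = sum(r for _, r in flows)
    total = sent + recv
    return {
        "flow_count": len(flows),
        "bytes_total": total,
        "recv_ratio": recv / total if total else 0.0,
    }

def looks_essential(feats) -> bool:
    # Destinations contacted rarely and almost one-way (telemetry-style
    # uploads) are flagged non-essential and become blocking candidates.
    if feats["flow_count"] < 3 and feats["recv_ratio"] < 0.1:
        return False
    return True
```

In the real system the decision is learned from labeled device behavior rather than hand-set, which is what lets it generalize to previously unseen destinations.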
Training Large Language Models for Advanced Typosquatting Detection
Since the early days of the commercial internet, typosquatting has exploited the simplest of human errors, mistyping a URL, to serve as a potent tool for cybercriminals. Initially observed as an opportunistic tactic, typosquatting involves registering domain names that closely match those of reputable brands, thereby redirecting users to counterfeit websites. This has evolved into a sophisticated form of cyberattack used to conduct phishing schemes, distribute malware, and harvest sensitive data. Now, with billions of domain names and TLDs in circulation, the scale and impact of typosquatting have grown exponentially, posing significant risks to individuals, businesses, and national cybersecurity infrastructure. This whitepaper explores how emerging large language model (LLM) techniques can enhance the detection of typosquatting attempts, ultimately fortifying defenses against one of the internet's most enduring cyber threats. Cybercriminals employ various domain squatting techniques to deceive users and bypass traditional security measures. These methods include, but are not limited to:
- Character Substitution: swapping similar-looking characters, such as replacing "o" with "0" in go0gle[.]com, to trick users into believing they are visiting the legitimate site.
- Omission or Addition: removing or adding a character, creating domains such as gogle[.]com
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.57)
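The two squatting techniques named above are mechanical enough to enumerate directly, which is how candidate lists for detectors are often generated. The homoglyph map below is a small illustrative subset, not a complete confusable-character table.

```python
import string

# Small illustrative homoglyph map; real tables are much larger.
HOMOGLYPHS = {"o": "0", "l": "1", "i": "1", "e": "3"}

def substitution_variants(domain: str) -> set:
    """Swap similar-looking characters, e.g. google.com -> g0ogle.com."""
    name, _, tld = domain.partition(".")
    return {name[:i] + HOMOGLYPHS[ch] + name[i + 1:] + "." + tld
            for i, ch in enumerate(name) if ch in HOMOGLYPHS}

def omission_variants(domain: str) -> set:
    """Drop one character, e.g. google.com -> gogle.com."""
    name, _, tld = domain.partition(".")
    return {name[:i] + name[i + 1:] + "." + tld for i in range(len(name))}

def addition_variants(domain: str) -> set:
    """Insert one character at each position."""
    name, _, tld = domain.partition(".")
    return {name[:i] + c + name[i:] + "." + tld
            for i in range(len(name) + 1) for c in string.ascii_lowercase}
```

Enumerations like these give a labeled positive class cheaply; the LLM-based detectors discussed in the whitepaper aim to also catch variants that no fixed generator anticipates.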
Comprehensive Survey on Adversarial Examples in Cybersecurity: Impacts, Challenges, and Mitigation Strategies
Deep learning (DL) has significantly transformed cybersecurity, enabling advancements in malware detection, botnet identification, intrusion detection, user authentication, and encrypted traffic analysis. However, the rise of adversarial examples (AE) poses a critical challenge to the robustness and reliability of DL-based systems. These subtle, crafted perturbations can deceive models, leading to severe consequences like misclassification and system vulnerabilities. This paper provides a comprehensive review of the impact of AE attacks on key cybersecurity applications, highlighting both their theoretical and practical implications. We systematically examine the methods used to generate adversarial examples, their specific effects across various domains, and the inherent trade-offs attackers face between efficacy and resource efficiency. Additionally, we explore recent advancements in defense mechanisms, including gradient masking, adversarial training, and detection techniques, evaluating their potential to enhance model resilience. By summarizing cutting-edge research, this study aims to bridge the gap between adversarial research and practical security applications, offering insights to fortify the adoption of DL solutions in cybersecurity.
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- North America > United States > Virginia (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.67)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
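The canonical generation method the survey covers, the Fast Gradient Sign Method (FGSM), computes x_adv = x + ε·sign(∇ₓL). For a plain logistic model the gradient has a closed form, so the attack fits in a few lines; the weights and inputs below are hypothetical.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def predict(w, b, x):
    """Logistic model: p = sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    """x_adv = x + eps * sign(dL/dx) for binary cross-entropy loss.
    For logistic regression, dL/dx_i = (p - y) * w_i."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]
```

Each coordinate moves by exactly ε in the loss-increasing direction, which is what makes the perturbation both bounded and effective, and what defenses like adversarial training must anticipate.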
LLMs for Domain Generation Algorithm Detection
La O, Reynier Leyva, Catania, Carlos A., Parlanti, Tatiana
We perform a detailed evaluation of two important techniques: In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), showing how they can improve detection. SFT increases performance by using domain-specific data, whereas ICL helps the detection model quickly adapt to new threats without requiring much retraining. We use Meta's Llama3 8B model on a custom dataset with 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. Results show that LLM-based methods can achieve competitive performance in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
- South America > Argentina > Cuyo > Mendoza Province > Mendoza (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
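The ICL setup described above amounts to prompt construction: a handful of labeled domains precede the query, and the model completes the label. The examples and prompt wording below are illustrative assumptions, not the paper's prompt.

```python
# Hypothetical few-shot ICL prompt for an LLM-based DGA detector.
# Example domains and labels are illustrative, not from the paper's dataset.
FEW_SHOT = [
    ("google.com", "legit"),
    ("kq3v9z7j1x.net", "dga"),
    ("facebook.com", "legit"),
    ("securewordupdate.biz", "dga"),  # word-based DGAs are the hard case
]

def build_icl_prompt(domain: str) -> str:
    lines = ["Classify each domain as 'dga' or 'legit'.", ""]
    for d, label in FEW_SHOT:
        lines.append(f"Domain: {d}\nLabel: {label}")
    lines.append(f"Domain: {domain}\nLabel:")
    return "\n".join(lines)
```

Because adapting to a new DGA family only requires swapping the in-context examples, ICL avoids the retraining cost that SFT incurs, at some cost in peak accuracy.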
DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification
Mahdaouy, Abdelkader El, Lamsiyah, Salima, Idrissi, Meryem Janati, Alami, Hamza, Yartaoui, Zakaria, Berrada, Ismail
Detecting and classifying suspicious or malicious domain names and URLs is a fundamental task in cybersecurity. To leverage such indicators of compromise, cybersecurity vendors and practitioners often maintain and update blacklists of known malicious domains and URLs. However, blacklists frequently fail to identify emerging and obfuscated threats. Over the past few decades, there has been significant interest in developing machine learning models that automatically detect malicious domains and URLs, addressing the limitations of blacklist maintenance and updates. In this paper, we introduce DomURLs_BERT, a pre-trained BERT-based encoder adapted for detecting and classifying suspicious/malicious domains and URLs. DomURLs_BERT is pre-trained using the Masked Language Modeling (MLM) objective on a large multilingual corpus of URLs, domain names, and a Domain Generation Algorithms (DGA) dataset. In order to assess the performance of DomURLs_BERT, we have conducted experiments on several binary and multi-class classification tasks involving domain names and URLs, covering phishing, malware, DGA, and DNS tunneling. The evaluation results show that the proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets. The pre-training dataset, the pre-trained DomURLs_BERT encoder, and the experiments source code are publicly available.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > China > Hong Kong (0.04)
- Africa > Middle East > Morocco > Fès-Meknès Region > Fez (0.04)
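The MLM objective used for pre-training can be illustrated on URL tokens: a fraction of positions is hidden and the model learns to reconstruct them. The sketch below is deliberately simplified (it omits BERT's 80/10/10 replacement scheme) and the mask rate and token choices are the usual conventions, not values confirmed by the paper.

```python
import random

def mlm_mask(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random fraction of tokens with [MASK]; the model is
    trained to predict the originals at masked positions only.
    Simplified: BERT's 80/10/10 replacement scheme is omitted."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)      # prediction target
        else:
            masked.append(tok)
            targets.append(None)     # not scored at this position
    return masked, targets

# Character-level tokens of a URL, as one plausible tokenization:
url_tokens = list("http://example.com/login?id=1")
```

Training on masked URLs and domain names this way is what lets the encoder pick up character-level regularities (entropy, token shapes, TLD patterns) that distinguish benign from malicious strings.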
Large Generative Graph Models
Wang, Yu, Rossi, Ryan A., Park, Namyong, Chen, Huiyuan, Ahmed, Nesreen K., Trivedi, Puja, Dernoncourt, Franck, Koutra, Danai, Derr, Tyler
Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on a huge amount of language corpus, images, videos, and audio that are extremely diverse from numerous domains. This training paradigm over diverse well-curated data lies at the heart of generating creative and sensible content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained only on one dataset each time, which cannot replicate the revolutionary success achieved by LGMs in other fields. To remedy this crucial gap, we propose a new class of graph generative model, the Large Graph Generative Model (LGGM), that is trained on a large corpus of graphs (over 5000 graphs) from 13 different domains. We empirically demonstrate that the pre-trained LGGM has superior zero-shot generative capability to existing graph generative models. Furthermore, our pre-trained LGGM can be easily fine-tuned with graphs from target domains and demonstrates even better performance than models trained directly from scratch, behaving as a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGM with the capability to generate graphs given text prompts (Text-to-Graph), such as the description of the network name and domain (e.g., "The power-1138-bus graph represents a network of buses in a power distribution system."), and network statistics (e.g., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge in the underlying language model, offering users fine-grained control of the generated graphs. We release the code, the model checkpoint, and the datasets at https://lggm-lg.github.io/.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Missouri > Boone County > Columbia (0.04)
- North America > United States > Michigan (0.04)
- Information Technology > Security & Privacy (0.67)
- Energy > Power Industry (0.48)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)